Publicis Sapient | Data Engineer Interview Experience | 6 YOE

  • 17-Jul-2025
  • 5 min read


Round 1: Screening Test (1 Hour – PySpark Coding)

Environment: Coding done in a virtual lab provided by the company.

Task: Complex data transformation using PySpark.

Difficulty: High. Reaching the expected output required chaining multiple transformations; a representative sketch follows the skills list.

Skills Tested:

  • DataFrame operations
  • Joins, window functions
  • Handling nested structures, nulls, and schema enforcement
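
A minimal sketch of the kind of chained transformation this round tests. The `orders` and `customers` data, the column names, and the ranking logic are hypothetical stand-ins, not the actual lab problem:

```python
from pyspark.sql import SparkSession, Row, Window
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("screening-sketch").getOrCreate()

# Hypothetical inputs: orders with a nullable amount, customers with a nested
# address struct. The DDL schema string enforces types instead of inferring them.
orders = spark.createDataFrame(
    [("o1", "c1", 120.0), ("o2", "c1", None), ("o3", "c2", 80.0), ("o4", "c1", 95.5)],
    "order_id string, customer_id string, amount double",
)
customers = spark.createDataFrame([
    Row(customer_id="c1", address=Row(city="Pune", country="IN")),
    Row(customer_id="c2", address=Row(city="Berlin", country="DE")),
])

# Chain: null handling -> join -> flatten a nested field -> window rank -> filter.
top_orders = (
    orders
    .withColumn("amount", F.coalesce(F.col("amount"), F.lit(0.0)))
    .join(customers, "customer_id", "left")
    .withColumn("city", F.col("address.city"))
    .withColumn(
        "order_rank",
        F.row_number().over(
            Window.partitionBy("customer_id").orderBy(F.col("amount").desc())
        ),
    )
    .filter(F.col("order_rank") <= 2)
    .drop("address")
)
top_orders.show()
```

What the screen seems to probe is comfort composing these operations into one pipeline, not any single API call in isolation.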

Round 2: Technical + Project Discussion (Face-to-Face)

SQL (5 Questions – Hard Level)

Advanced SQL involving:

  • Multiple joins
  • Window functions (LAG, LEAD, NTILE)
  • CTEs and nested queries
  • Aggregations with filtering (a condensed example follows this list)
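
A hedged example in that style, run through `spark.sql` so it stays in the same PySpark environment; the `sales` table, columns, and thresholds are invented for illustration:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sql-sketch").getOrCreate()

# A tiny invented sales table, registered as a view so the SQL is runnable.
spark.createDataFrame(
    [("2025-01", "c1", 100.0), ("2025-02", "c1", 130.0),
     ("2025-01", "c2", 90.0), ("2025-02", "c2", 60.0)],
    "month string, customer_id string, revenue double",
).createOrReplaceTempView("sales")

# CTE + LAG for month-over-month deltas, NTILE for bucketing, then an
# aggregation filtered with HAVING -- the style of question asked.
spark.sql("""
    WITH monthly AS (
        SELECT customer_id,
               month,
               revenue,
               LAG(revenue) OVER (PARTITION BY customer_id ORDER BY month) AS prev_revenue,
               NTILE(2) OVER (ORDER BY revenue DESC) AS revenue_half
        FROM monthly_src
    )
    SELECT customer_id,
           SUM(revenue - COALESCE(prev_revenue, 0)) AS total_growth,
           MIN(revenue_half) AS best_half
    FROM monthly
    GROUP BY customer_id
    HAVING SUM(revenue) > 150
""".replace("monthly_src", "sales")).show()
```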

Project Discussion

Deep dive into past projects:

  • Architecture
  • Tooling (e.g., Spark, Delta Lake, Azure/AWS)
  • Your role in data ingestion, transformation, and performance tuning

PySpark (4 Coding Questions)

Real-world data manipulation using:

  • groupBy, agg, window
  • Conditional logic with when, otherwise
  • Handling nulls and schema mismatches (see the sketch after this list)
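
A small, self-contained sketch combining those three constructs; the `txns` data and the tier thresholds are made up:

```python
from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("round2-sketch").getOrCreate()

# Hypothetical transactions; a None sneaks in to exercise null handling.
txns = spark.createDataFrame(
    [("c1", "2025-01-01", 50.0), ("c1", "2025-01-05", None), ("c2", "2025-01-02", 200.0)],
    "customer_id string, txn_date string, amount double",
)

# Conditional bucketing with when/otherwise, explicit null handling with fillna.
labeled = (
    txns.fillna({"amount": 0.0})
        .withColumn(
            "tier",
            F.when(F.col("amount") >= 100, "high")
             .when(F.col("amount") > 0, "low")
             .otherwise("unknown"),
        )
)

# groupBy + agg for per-customer totals, plus a window for a running total.
w = Window.partitionBy("customer_id").orderBy("txn_date")
summary = (
    labeled.withColumn("running_total", F.sum("amount").over(w))
           .groupBy("customer_id")
           .agg(F.count("*").alias("txn_count"), F.sum("amount").alias("total_spend"))
)
summary.show()
```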

Spark Optimization Techniques

Tuning Spark configurations and job design for performance; common knobs are sketched below.
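
The questions here were conceptual, but a sketch helps anchor them. These are standard Spark settings and hints (adaptive query execution, shuffle partition count, broadcast threshold, caching); the specific values are illustrative assumptions, not recommendations:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Illustrative settings only; the right values depend on cluster size and data shape.
spark = (
    SparkSession.builder
    .appName("tuning-sketch")
    .config("spark.sql.adaptive.enabled", "true")            # AQE re-optimizes plans at runtime
    .config("spark.sql.shuffle.partitions", "200")           # size to cores and data volume
    .config("spark.sql.autoBroadcastJoinThreshold", "64MB")  # auto-broadcast small join sides
    .getOrCreate()
)

large = spark.range(1_000_000).withColumnRenamed("id", "key")
small = spark.range(100).withColumnRenamed("id", "key")

# An explicit broadcast hint skips shuffling the large side of the join.
joined = large.join(F.broadcast(small), "key")

# Cache only when a result is reused across actions; release it when done.
joined.cache()
print(joined.count())
joined.unpersist()
```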

Round 3: HR

Salary discussion, location, etc.